Many to Many Relationships

A many to many relationship is one in which an object can be related to a set of related objects and each of those related objects can be related to a set of original objects. A good and useful example of this is the concept of tags. An object can usually have multiple tags, and each of those tags can be associated with multiple objects.

In a SQL database, objects are only connected to one another through foreign keys, which (you may recall from the previous notebook) are one to many. However, I promised you many to many relationships. To do this, we have to cheat. If we add an additional table, called a junction table to the mix, we can define a one to many relationship between our original object and the junction object and another one to many relationship between the tag object and the junction table. It sounds a bit hairy, but SQLAlchemy makes it not too bad.

Junction Tables in SQLAlchemy

Making a junction table in SQLAlchemy is not as bad as it sounds. In fact, SQLAlchemy gives us a way to quickly define tables that we don't much care about. First, let's load up the objects from before.


In [1]:
from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine ('sqlite:///:memory:', echo = False)
Base = declarative_base()

In [2]:
import numpy as np

class Star (Base):    
    __tablename__ = 'stars'
    
    id = Column (Integer, primary_key = True)

    name = Column (String, unique = True)
    ra = Column (Float)
    dec = Column (Float)
    
    def __init__ (self, *args, **kwargs):
        try:
            kwargs ["ra"] = float (kwargs ["ra"])
        except TypeError:
            kwargs ["ra"] = self.raToFloat (*(kwargs ["ra"]))
            
        try:
            kwargs ["dec"] = float (kwargs ["dec"])
        except TypeError:
            kwargs ["dec"] = self.decToFloat (*(kwargs ["dec"]))
            
        super (Star, self).__init__ (*args, **kwargs)

    @staticmethod
    def raToFloat (hours, minutes = 0.0, seconds = 0.0):
        return np.radians ((hours + minutes / 60.0 + seconds / 3600.0) * 360 / 24)
    
    @staticmethod
    def decToFloat (degrees, arcmins = 0.0, arcsecs = 0.0):
        return np.radians ((degrees + arcmins / 60.0 + arcsecs / 3600.0))
    
    def __repr__ (self):
        return "<Star Object %s at (RA=%f, DEC=%f)>" % (self.name, self.ra, self.dec)

Now let's define our junction table. We'll do this in the "classical" style of SQLAlchemy rather than the declarative style, which you've been using up to now. While you may not recognize the style, all the components you've been using up to now are still there. The first argument of Table is the table name, then the metadata object from Base (which you passed in by inheriting from Base), then a list of columns. Note: you don't have to define the Tag object before defining the junction object. SQLAlchemy will take care of all that when we actually attempt to create the database in a moment.


In [3]:
from sqlalchemy import Table
from sqlalchemy import ForeignKey

star_tag = Table('star_tag', Base.metadata,
    Column('star_id', Integer, ForeignKey('stars.id')),
    Column('tag_id', Integer, ForeignKey('tags.id'))
)

Finally, we define our Tag object, which looks just like our other class definitions, except that we need to include the junction table in the relationship as the argument "secondary". The get method I've added is not necessary for the function of the Tag object. It's a convenience that will automatically check whether a particular tag already exists before creating a new one.


In [4]:
from sqlalchemy.orm import relationship
from sqlalchemy.orm.exc import NoResultFound

class Tag (Base):
    __tablename__ = 'tags'
    
    id = Column(Integer, primary_key = True)
    tag = Column(String(50), nullable=False, unique=True)
    
    stars = relationship ('Star', secondary = star_tag, backref = 'tags')

    @classmethod
    def get (cls, session, tag):
        try:
            return session.query (cls).filter_by (tag = tag).one ()
        except NoResultFound:
            return cls (tag = tag)
        
    def __repr__ (self):
        return "<Tag %s>" % self.tag

In [5]:
from sqlalchemy.orm import sessionmaker

Base.metadata.create_all (engine)
Session = sessionmaker (bind = engine)

Now that you've done that, I'll generate a dataset for you to play with. I've collected the 10 brightest stars in the night sky and have tagged them by spectral class, luminosity class, and whether that star has priority. Add some of your own tags.


In [6]:
# The 10 brightest stars in the sky (not the Sun), tagged according to spectral class, luminosity class, and priority

session = Session ()

sirius = Star (name = "Sirius", ra = (6, 45, 08.9), dec = (-16, 42, 58))
sirius.tags.append (Tag.get (session, "A"))
sirius.tags.append (Tag.get (session, "V"))
sirius.tags.append (Tag.get (session, "Priority"))
session.add (sirius)

canopus = Star (name = "Canopus", ra = (6, 23, 57.1), dec = (-52, 41, 45))
canopus.tags.append (Tag.get (session, "F"))
canopus.tags.append (Tag.get (session, "II"))
session.add (canopus)

rigil = Star (name = "Rigil Kentaurus", ra = (14, 39, 35.9), dec = (-60, 50, 7))
rigil.tags.append (Tag.get (session, "G"))
rigil.tags.append (Tag.get (session, "V"))
session.add (rigil)

arcturus = Star (name = "Arcturus", ra = (14, 15, 39.7), dec = (19, 10, 57))
arcturus.tags.append (Tag.get (session, "K"))
arcturus.tags.append (Tag.get (session, "III"))
arcturus.tags.append (Tag.get (session, "Priority"))
session.add (arcturus)

vega = Star (name = "Vega", ra = (18, 36, 56.3), dec = (38, 47, 1))
vega.tags.append (Tag.get (session, "A"))
vega.tags.append (Tag.get (session, "V"))
vega.tags.append (Tag.get (session, "Priority"))
session.add (vega)

capella = Star (name = "Capella", ra = (5, 16, 41.4), dec = (45, 59, 53))
capella.tags.append (Tag.get (session, "G"))
capella.tags.append (Tag.get (session, "III"))
session.add (capella)

rigel = Star (name = "Rigel", ra = (5, 14, 32.3), dec = (-8, 12, 6))
rigel.tags.append (Tag.get (session, "B"))
rigel.tags.append (Tag.get (session, "Ia"))
session.add (rigel)

procyon = Star (name = "Procyon", ra = (7, 39, 18.1), dec = (5, 13, 30))
procyon.tags.append (Tag.get (session, "F"))
procyon.tags.append (Tag.get (session, "IV"))
session.add (procyon)

achernar = Star (name = "Achernar", ra = (1, 37, 42.9), dec = (-57, 14, 12))
achernar.tags.append (Tag.get (session, "B"))
achernar.tags.append (Tag.get (session, "V"))
session.add (achernar)

betelgeuse = Star (name = "Betelgeuse", ra = (5, 55, 10.3), dec = (7, 24, 25))
betelgeuse.tags.append (Tag.get (session, "M"))
betelgeuse.tags.append (Tag.get (session, "Ia"))
betelgeuse.tags.append (Tag.get (session, "Priority"))
session.add (betelgeuse)

session.commit ()

In [7]:

Querying with Many to Many Relationships

Of course, what good is tagging these stars unless we know how to query against the database? All of the querying tools from one to many relationships still apply, but we can add a few more tools to the mix. In particular, the many form of the has method: the contains method.


In [7]:
session.query (Star).filter (Star.tags.contains (Tag.get (session, "Priority"))).all ()


Out[7]:
[<Star Object Sirius at (RA=1.767793, DEC=-0.266754)>,
 <Star Object Arcturus at (RA=3.733528, DEC=0.334798)>,
 <Star Object Vega at (RA=4.873563, DEC=0.676902)>,
 <Star Object Betelgeuse at (RA=1.549729, DEC=0.129276)>]

Once you gain some confidence with the SQLAlchemy commands, you can do some fancy querying. Below is an example of the use of a subquery. In effect, this is like generating an intermediate table that can be filled with derived quantities which can then be filtered appropriately. For example, the below commands return any tags with more than 3 associated stars. For your first time through, don't worry about reading the following break down, just take a look at the result and continue on to the next notebook here.

The break down is as follows: we'd like the subquery to return a table with two columns, a tag object and the count of how many stars are in each tag. The func.count ('*') command informs SQLAlchemy that we actually want a count of all the contained objects rather than a list. We label the count column as "count" for later use. With just that, we'd actually only get the total number of tags. We need to specify that we want to look at the contained Star objects, which we do with a join command. This joins the Tag table with the Star table; however, since Tag does not have an explicit Foreign Key to Star, we need to point it toward the relationship between the two in the second argument of join.

Now, this would return the number of unique links between the Star and Tag tables, which is almost what we want. To get it to be the number of stars in a particular tag, we need to group_by that tag. Now, we get a table that looks like our original goal. We need to filter the results of this query, so we use a subquery.

We now want to query the table of Tags produced by the subquery. If we try query (Tag), we query the table associated with the Tag object. To query the table associated with the subquery, we need to alias the Tag object with the subquery. We then query on the alias, and filter by the count column from the subquery.

That was a lot of explanation, but the key advantage is you can do all that in three relatively simple lines of SQLAlchemy.


In [8]:
from sqlalchemy.orm import aliased
from sqlalchemy import func

stmt = session.query (Tag, func.count ('*').label ("count")).join (Star, Tag.stars).group_by (Tag).subquery ()
subtag = aliased (Tag, stmt)
# print (stmt.all ())
session.query (subtag).filter (stmt.c.count > 3).all ()


Out[8]:
[<Tag V>, <Tag Priority>]

In [8]: